This notebook is the week-5 Honors project for the IBM Machine Learning course Exploratory Data Analysis for Machine Learning. It serves as a proof of concept for the techniques taught in the course. The dataset is the official one released by the Government of India, based on the 2001 and 2011 censuses.
The data covers 35 Indian states and union territories. Literacy rates are reported for three categories: overall (total), rural and urban. Every value is a percentage of the population of that state or union territory.
The CSV file is sourced from the Government of India website and has three key fields for each census year: overall literacy rate, urban literacy rate and rural literacy rate.
Understand literacy rates in India and identify which states/UTs saw the highest growth in literacy rate.
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly as ply
pio.templates.default = "plotly_dark"
df = pd.read_csv("GOI.csv")
df.head(10)
| | Category | Country/ States/ Union Territories Name | Literacy Rate (Persons) - Total - 2001 | Literacy Rate (Persons) - Total - 2011 | Literacy Rate (Persons) - Rural - 2001 | Literacy Rate (Persons) - Rural - 2011 | Literacy Rate (Persons) - Urban - 2001 | Literacy Rate (Persons) - Urban - 2011 |
|---|---|---|---|---|---|---|---|---|
| 0 | Country | INDIA | 64.8 | 73.0 | 58.7 | 67.8 | 79.9 | 84.1 |
| 1 | State | Andhra Pradesh | 60.5 | 67.0 | 54.5 | 60.4 | 76.1 | 80.1 |
| 2 | State | Arunachal Pradesh | 54.3 | 65.4 | 47.8 | 59.9 | 78.3 | 82.9 |
| 3 | State | Assam | 63.3 | 72.2 | 59.7 | 69.3 | 85.3 | 88.5 |
| 4 | State | Bihar | 47.0 | 61.8 | 43.9 | 59.8 | 71.9 | 76.9 |
| 5 | State | Chhattisgarh | 64.7 | 70.3 | 60.5 | 66.0 | 80.6 | 84.0 |
| 6 | State | Goa | 82.0 | 88.7 | 79.7 | 86.6 | 84.4 | 90.0 |
| 7 | State | Gujarat | 69.1 | 78.0 | 61.3 | 71.7 | 81.8 | 86.3 |
| 8 | State | Haryana | 67.9 | 75.6 | 63.2 | 71.4 | 79.2 | 83.1 |
| 9 | State | Himachal Pradesh | 76.5 | 82.8 | 75.1 | 81.9 | 88.9 | 91.1 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 8 columns):
 #   Column                                   Non-Null Count  Dtype
---  ------                                   --------------  -----
 0   Category                                 36 non-null     object
 1   Country/ States/ Union Territories Name  36 non-null     object
 2   Literacy Rate (Persons) - Total - 2001   36 non-null     float64
 3   Literacy Rate (Persons) - Total - 2011   36 non-null     float64
 4   Literacy Rate (Persons) - Rural - 2001   36 non-null     float64
 5   Literacy Rate (Persons) - Rural - 2011   36 non-null     float64
 6   Literacy Rate (Persons) - Urban - 2001   36 non-null     float64
 7   Literacy Rate (Persons) - Urban - 2011   36 non-null     float64
dtypes: float64(6), object(2)
memory usage: 2.4+ KB
Note: The column names include the word "Persons". My assumption is that each rate is documented per 100 persons, i.e. as a percentage.
We have data for two census years, 2001 and 2011, a decade apart. We can generate new attributes to see the percentage change in literacy rate over the decade.
df['Total - Per. Change'] = (df.loc[:,'Literacy Rate (Persons) - Total - 2011'] -
df.loc[:,'Literacy Rate (Persons) - Total - 2001'])/df.loc[:,'Literacy Rate (Persons) - Total - 2001']
df['Rural - Per. Change'] = (df.loc[:,'Literacy Rate (Persons) - Rural - 2011'] -
df.loc[:,'Literacy Rate (Persons) - Rural - 2001'])/df.loc[:,'Literacy Rate (Persons) - Total - 2001']
df['Urban - Per. Change'] = (df.loc[:,'Literacy Rate (Persons) - Urban - 2011'] -
df.loc[:,'Literacy Rate (Persons) - Urban - 2001'])/df.loc[:,'Literacy Rate (Persons) - Total - 2001']
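Note that the Rural and Urban expressions above divide by the Total 2001 rate. If the intent is the relative change of each series against its own 2001 value, a minimal sketch could look like the following. The small `sample` frame (three rows copied from the `df.head(10)` output, with shortened column names) and the `(own)` column suffix are assumptions made for this illustration, not part of the notebook.

```python
import pandas as pd

# Three rows copied from the df.head(10) output; `sample` is an illustrative name.
sample = pd.DataFrame({
    'Name': ['INDIA', 'Andhra Pradesh', 'Bihar'],
    'Total - 2001': [64.8, 60.5, 47.0], 'Total - 2011': [73.0, 67.0, 61.8],
    'Rural - 2001': [58.7, 54.5, 43.9], 'Rural - 2011': [67.8, 60.4, 59.8],
    'Urban - 2001': [79.9, 76.1, 71.9], 'Urban - 2011': [84.1, 80.1, 76.9],
})

# Relative change of each series against its own 2001 baseline.
for area in ['Total', 'Rural', 'Urban']:
    old = sample[f'{area} - 2001']
    new = sample[f'{area} - 2011']
    sample[f'{area} - Per. Change (own)'] = (new - old) / old

print(sample.filter(like='Per. Change').round(4))
```

With this definition the Total column is unchanged, while the Rural and Urban values differ from those computed against the Total baseline.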
The column names are too long, so I will remove characters before Total-'year', Rural-'year' and Urban-'year'
new_col = []
for i in df.columns:
    # Keep only the part after '(Persons) - ', e.g. 'Total - 2001'
    new_col.append(i.split('(Persons) - ')[-1])
df.columns = new_col
df.head()
| | Category | Country/ States/ Union Territories Name | Total - 2001 | Total - 2011 | Rural - 2001 | Rural - 2011 | Urban - 2001 | Urban - 2011 | Total - Per. Change | Rural - Per. Change | Urban - Per. Change |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Country | INDIA | 64.8 | 73.0 | 58.7 | 67.8 | 79.9 | 84.1 | 0.126543 | 0.140432 | 0.064815 |
| 1 | State | Andhra Pradesh | 60.5 | 67.0 | 54.5 | 60.4 | 76.1 | 80.1 | 0.107438 | 0.097521 | 0.066116 |
| 2 | State | Arunachal Pradesh | 54.3 | 65.4 | 47.8 | 59.9 | 78.3 | 82.9 | 0.204420 | 0.222836 | 0.084715 |
| 3 | State | Assam | 63.3 | 72.2 | 59.7 | 69.3 | 85.3 | 88.5 | 0.140600 | 0.151659 | 0.050553 |
| 4 | State | Bihar | 47.0 | 61.8 | 43.9 | 59.8 | 71.9 | 76.9 | 0.314894 | 0.338298 | 0.106383 |
The dataset contains rows for the whole country as well as for the states and union territories. I will first view the overall literacy rates of the country and then remove that row from the dataset, so that it is easier to view and compare literacy rates among states and union territories.
India = df[df['Category'] == 'Country'].T
India = India.iloc[2:8,:]
India.reset_index(inplace=True)
India.columns = ['Measure', 'Value']
India.loc[:,'Measure'] = India['Measure'].apply(lambda x : str(x).split(' -')[0])
India_2001 = India.iloc[[0,2,4],:]
India_2011 = India.iloc[[1,3,5],:]
fig = go.Figure(data=[
go.Bar(name='2001', x=India_2001['Measure'], y=India_2001['Value'], marker_color='rgb(55, 83, 109)'),
go.Bar(name='2011', x=India_2011['Measure'], y=India_2011['Value'], marker_color='rgb(26, 118, 255)')
])
fig.update_layout(yaxis_range=[0, 100],barmode='group', title='Overall Literacy Rate in India :',yaxis_title="Persons")
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.show()
We have three attributes for literacy rates: total, rural and urban. We'll take a look at each of them to see how they're distributed across the nation.
df = df.iloc[1:, :].copy()  # Remove the row for India as a whole; copy so later in-place edits act on an independent frame.
df.rename(columns={'Country/ States/ Union Territories Name' :'States/ Union Territories'}, inplace = True)
df.head()
| | Category | States/ Union Territories | Total - 2001 | Total - 2011 | Rural - 2001 | Rural - 2011 | Urban - 2001 | Urban - 2011 | Total - Per. Change | Rural - Per. Change | Urban - Per. Change |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | State | Andhra Pradesh | 60.5 | 67.0 | 54.5 | 60.4 | 76.1 | 80.1 | 0.107438 | 0.097521 | 0.066116 |
| 2 | State | Arunachal Pradesh | 54.3 | 65.4 | 47.8 | 59.9 | 78.3 | 82.9 | 0.204420 | 0.222836 | 0.084715 |
| 3 | State | Assam | 63.3 | 72.2 | 59.7 | 69.3 | 85.3 | 88.5 | 0.140600 | 0.151659 | 0.050553 |
| 4 | State | Bihar | 47.0 | 61.8 | 43.9 | 59.8 | 71.9 | 76.9 | 0.314894 | 0.338298 | 0.106383 |
| 5 | State | Chhattisgarh | 64.7 | 70.3 | 60.5 | 66.0 | 80.6 | 84.0 | 0.086553 | 0.085008 | 0.052550 |
df.sort_values(by='Total - 2001', inplace=True)
fig = go.Figure(data=[
go.Bar(name='2001', x=df['Total - 2001'], y=df['States/ Union Territories'], orientation='h', marker_color='rgb(255, 0, 96)'),
go.Bar(name='2011', x=df['Total - 2011'], y=df['States/ Union Territories'], orientation='h', marker_color='rgb(0, 223, 162)')
])
fig.update_layout(xaxis_range=[0, 100],barmode='group', title = 'Total Literacy Rate Across Nation :', yaxis_title = "States/ Union Territories",
xaxis_title = "Total Literacy Rate")
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.show()
lowest_2001 = df.sort_values(by=['Total - 2001']).head()
highest_2001 = df.sort_values(by=['Total - 2001']).tail()
fig = go.Figure(data = [
go.Bar(name = 'Lowest_2001', x=lowest_2001['Total - 2001'], y=lowest_2001['States/ Union Territories'],orientation='h', marker_color='rgb(246, 250, 112)'),
go.Bar(name = 'Highest_2001', x=highest_2001['Total - 2001'], y=highest_2001['States/ Union Territories'],orientation='h', marker_color='rgb(0, 121, 255)')
])
fig.update_layout(xaxis_range=[0, 100],barmode='group', title = ' Top 5 highest and lowest "Total literacy rate" in 2001 :', xaxis_title = "Total Literacy Rate in 2001",
yaxis_title = "States/ Union Territories")
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.show()
lowest_2011 = df.sort_values(by=['Total - 2011']).head()
highest_2011 = df.sort_values(by=['Total - 2011']).tail()
fig = go.Figure(data = [
go.Bar(name = 'Lowest_2011', x=lowest_2011['Total - 2011'], y=lowest_2011['States/ Union Territories'],orientation='h', marker_color='rgb(246, 250, 112)'),
go.Bar(name = 'Highest_2011', x=highest_2011['Total - 2011'], y=highest_2011['States/ Union Territories'], orientation='h', marker_color='rgb(0, 121, 255)')
])
fig.update_layout(xaxis_range=[0, 100],barmode='group', title = ' Top 5 highest and lowest "Total literacy rate" in 2011 :', xaxis_title = "Total Literacy Rate in 2011",
yaxis_title = "States/ Union Territories")
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.show()
px.bar(df.sort_values(by='Total - Per. Change'),
x='Total - Per. Change', y='States/ Union Territories',
color='Total - Per. Change', title='Total Per. Change')
df.sort_values(by='Rural - 2001', inplace=True)
fig = go.Figure(data = [
go.Bar(name='2001', x=df['Rural - 2001'], y=df['States/ Union Territories'], orientation='h', marker_color='rgb(255, 0, 96)'),
go.Bar(name='2011', x=df['Rural - 2011'], y=df['States/ Union Territories'], orientation='h', marker_color='rgb(0, 223, 162)')
])
fig.update_layout(xaxis_range=[0, 100],barmode='group', title = 'Literacy rate in rural areas across the country :', yaxis_title = "States/ Union Territories",
xaxis_title = "Rural India Literacy Rate")
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.show()
lowest_2001 = df.sort_values(by=['Rural - 2001']).head()
highest_2001 = df.sort_values(by=['Rural - 2001']).tail()
fig = go.Figure(data = [
go.Bar(name = 'Lowest_2001', x=lowest_2001['Rural - 2001'], y=lowest_2001['States/ Union Territories'],orientation='h', marker_color='rgb(246, 250, 112)'),
go.Bar(name = 'Highest_2001', x=highest_2001['Rural - 2001'], y=highest_2001['States/ Union Territories'], orientation='h', marker_color='rgb(0, 121, 255)' )
])
fig.update_layout(xaxis_range=[0, 100],barmode='group', title = 'Top 5 highest and Lowest "Rural literacy rate" in 2001 :',
yaxis_title = "States/ Union Territories",
xaxis_title = "Rural India Literacy Rate in 2001")
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.show()
lowest_2011 = df.sort_values(by=['Rural - 2011']).head()
highest_2011 = df.sort_values(by=['Rural - 2011']).tail()
fig = go.Figure(data = [
go.Bar(name = 'Lowest_2011', x=lowest_2011['Rural - 2011'], y=lowest_2011['States/ Union Territories'],orientation='h', marker_color='rgb(246, 250, 112)'),
go.Bar(name = 'Highest_2011', x=highest_2011['Rural - 2011'], y=highest_2011['States/ Union Territories'], orientation='h', marker_color='rgb(0, 121, 255)' )
])
fig.update_layout(xaxis_range=[0, 100],barmode='group', title = 'Top 5 highest and Lowest "Rural literacy rate" in 2011 :',
yaxis_title = "States/ Union Territories",
xaxis_title = "Rural India Literacy Rate in 2011")
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='green',zeroline=True, zerolinewidth=1.5, zerolinecolor='green')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='green',zeroline=True, zerolinewidth=1.5, zerolinecolor='green')
fig.show()
px.bar(df.sort_values(by='Rural - Per. Change',ascending=True),
x='Rural - Per. Change', y='States/ Union Territories',
color='Rural - Per. Change', title='Rural Per. Change')
The distribution of rural literacy rates across states/union territories is similar to the one we saw for total literacy rates.
Bihar, Jharkhand, Jammu & Kashmir, D & N Haveli and Uttar Pradesh show the highest percentage increase in rural literacy rate, which suggests substantial effort in their rural areas.
Mizoram, Kerala, NCT of Delhi, Chandigarh and A & N Islands show the lowest percentage increase in rural literacy rate.
The states that improved the most in their rural areas are the ones that had the lowest rural literacy rates in 2001.
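One way to check the last observation is to correlate the 2001 rural rate with the change over the decade. The sketch below uses only the nine state rows visible in the `df.head(10)` output, so it illustrates the approach rather than reproducing the full result; the `rows` frame and its column names are assumptions made for this example.

```python
import pandas as pd

# Nine state rows copied from the df.head(10) output (rural literacy rate, %).
rows = pd.DataFrame({
    'state': ['Andhra Pradesh', 'Arunachal Pradesh', 'Assam', 'Bihar', 'Chhattisgarh',
              'Goa', 'Gujarat', 'Haryana', 'Himachal Pradesh'],
    'rural_2001': [54.5, 47.8, 59.7, 43.9, 60.5, 79.7, 61.3, 63.2, 75.1],
    'rural_2011': [60.4, 59.9, 69.3, 59.8, 66.0, 86.6, 71.7, 71.4, 81.9],
})

# Relative change against each state's own 2001 rural rate.
rows['rel_change'] = (rows['rural_2011'] - rows['rural_2001']) / rows['rural_2001']

# Pearson correlation between the 2001 baseline and the relative change.
corr = rows['rural_2001'].corr(rows['rel_change'])
print(f"correlation(baseline, relative change) = {corr:.2f}")
```

A clearly negative correlation would support the observation that states starting from a lower baseline improved the most.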
df.sort_values(by='Urban - 2001', inplace=True)
fig = go.Figure(data = [
go.Bar(name='2001', x=df['Urban - 2001'], y=df['States/ Union Territories'], orientation='h', marker_color='rgb(255, 0, 96)'),
go.Bar(name='2011', x=df['Urban - 2011'], y=df['States/ Union Territories'], orientation='h', marker_color='rgb(0, 223, 162)')
])
fig.update_layout(xaxis_range=[0, 100],barmode='group', title = 'Literacy rate in urban areas across the country :', yaxis_title = "States/ Union Territories",
xaxis_title = "Urban India Literacy Rate")
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='green')
fig.show()
lowest_2001 = df.sort_values(by=['Urban - 2001']).head()
highest_2001 = df.sort_values(by=['Urban - 2001']).tail()
fig = go.Figure(data = [
go.Bar(name = 'Lowest_2001', x=lowest_2001['Urban - 2001'], y=lowest_2001['States/ Union Territories'], orientation='h', marker_color='rgb(246, 250, 112)' ),
go.Bar(name = 'Highest_2001', x=highest_2001['Urban - 2001'], y=highest_2001['States/ Union Territories'],orientation='h', marker_color='rgb(0, 121, 255)' )
])
fig.update_layout(xaxis_range=[0, 100],barmode='group', title = 'Top 5 highest and Lowest "Urban literacy rate" in 2001 :',
yaxis_title = "States/ Union Territories",
xaxis_title = "Urban India Literacy Rate in 2001")
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='green',zeroline=True, zerolinewidth=1.5, zerolinecolor='green')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='green',zeroline=True, zerolinewidth=1.5, zerolinecolor='green')
fig.show()
lowest_2011 = df.sort_values(by=['Urban - 2011']).head()
highest_2011 = df.sort_values(by=['Urban - 2011']).tail()
fig = go.Figure(data = [
go.Bar(name = 'Lowest_2011', x=lowest_2011['Urban - 2011'], y=lowest_2011['States/ Union Territories'], orientation='h', marker_color='rgb(246, 250, 112)' ),
go.Bar(name = 'Highest_2011', x=highest_2011['Urban - 2011'], y=highest_2011['States/ Union Territories'],orientation='h', marker_color='rgb(0, 121, 255)' )
])
fig.update_layout(xaxis_range=[0, 100],barmode='group', title = 'Top 5 highest and Lowest "Urban literacy rate" in 2011 :',
yaxis_title = "States/ Union Territories",
xaxis_title = "Urban India Literacy Rate in 2011")
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='green',zeroline=True, zerolinewidth=1.5, zerolinecolor='green')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='green',zeroline=True, zerolinewidth=1.5, zerolinecolor='green')
fig.show()
px.bar(df.sort_values(by='Urban - Per. Change',ascending=True),
x='Urban - Per. Change', y='States/ Union Territories',
color='Urban - Per. Change', title='Urban Per. Change')
Note: I want the columns ['Total - Per. Change', 'Rural - Per. Change', 'Urban - Per. Change'] expressed as percentages, so I will multiply their values by 100.
columns_to_multiply = ['Total - Per. Change', 'Rural - Per. Change', 'Urban - Per. Change']
df[columns_to_multiply] = df[columns_to_multiply] * 100
df.head()
| | Category | States/ Union Territories | Total - 2001 | Total - 2011 | Rural - 2001 | Rural - 2011 | Urban - 2001 | Urban - 2011 | Total - Per. Change | Rural - Per. Change | Urban - Per. Change |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 26 | State | Uttar Pradesh | 56.3 | 67.7 | 52.5 | 65.5 | 69.8 | 75.1 | 20.248668 | 23.090586 | 9.413854 |
| 4 | State | Bihar | 47.0 | 61.8 | 43.9 | 59.8 | 71.9 | 76.9 | 31.489362 | 33.829787 | 10.638298 |
| 10 | State | Jammu & Kashmir | 55.5 | 67.2 | 49.8 | 63.2 | 71.9 | 77.1 | 21.081081 | 24.144144 | 9.369369 |
| 1 | State | Andhra Pradesh | 60.5 | 67.0 | 54.5 | 60.4 | 76.1 | 80.1 | 10.743802 | 9.752066 | 6.611570 |
| 22 | State | Rajasthan | 60.4 | 66.1 | 55.3 | 61.4 | 76.2 | 79.7 | 9.437086 | 10.099338 | 5.794702 |
temp_1 = df.groupby(by=['Category'])['Total - 2001'].mean().reset_index().T
temp_2 = df.groupby(by=['Category'])['Total - 2011'].mean().reset_index().T
temp_3 = df.groupby(by=['Category'])['Rural - 2001'].mean().reset_index().T
temp_4 = df.groupby(by=['Category'])['Rural - 2011'].mean().reset_index().T
temp_5 = df.groupby(by=['Category'])['Urban - 2001'].mean().reset_index().T
temp_6 = df.groupby(by=['Category'])['Urban - 2011'].mean().reset_index().T
frames = [temp_1, temp_2, temp_3, temp_4, temp_5, temp_6]
temp = pd.concat(frames)
loc = [0,1,3,5,7,9,11]
temp = temp.iloc[loc,:]
temp = temp.iloc[1:,:]
temp.reset_index(inplace=True)
temp.columns=['Category','State','Union Territory']
fig = go.Figure(data = [
go.Bar(name='States', y=temp['Category'], x=temp['State'], orientation='h', marker_color='rgb(26, 118, 255)'),
go.Bar(name='Union Territories', y=temp['Category'], x=temp['Union Territory'], orientation='h', marker_color='rgb(55, 83, 109)')
])
fig.update_layout(barmode='group')
fig.show()
The average literacy rate in union territories is higher than that of states in every category.
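The per-category averages behind this comparison can also be computed with a single `groupby`. The sketch below shows only the mechanics, on the first four rows of the `df.head(10)` output, where the categories happen to be Country and State; in the notebook at this point the categories are State and Union Territory. The frame name `head` and the shortened column names are assumptions for this example.

```python
import pandas as pd

# First four rows of the df.head(10) output, with shortened column names.
head = pd.DataFrame({
    'Category': ['Country', 'State', 'State', 'State'],
    'Total - 2001': [64.8, 60.5, 54.3, 63.3],
    'Total - 2011': [73.0, 67.0, 65.4, 72.2],
})

# Mean of every numeric column per category, then transpose so that
# measures become rows and categories become columns.
means = head.groupby('Category').mean(numeric_only=True).T
print(means)
```

The transposed result has one row per measure and one column per category, which is the layout a grouped bar chart needs.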
An interactive visualization of all states/ union territories and the observations.
df1 = pd.melt(df, id_vars='States/ Union Territories', value_vars=['Total - 2001', 'Total - 2011',
'Rural - 2001', 'Rural - 2011', 'Urban - 2001', 'Urban - 2011','Total - Per. Change', 'Rural - Per. Change', 'Urban - Per. Change'])
fig = px.bar(df1, x='variable', y='value', animation_frame='States/ Union Territories', color='value',
             color_continuous_scale='Viridis', title='Literacy Rate of each State/ Union Territory.'
             )
fig.update_layout(yaxis_range=[0, 100], xaxis_title='Measure')
fig.update_traces(marker_line_color='rgb(8,48,107)', marker_line_width=1.5,texttemplate='%{value}', textposition='outside')
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='green',zeroline=True, zerolinewidth=1.5, zerolinecolor='green')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='green',zeroline=True, zerolinewidth=1.5, zerolinecolor='green')
fig.show()
Null Hypothesis ($H_{0}$) : There is no significant difference between 2001 and 2011 in Total Literacy rate across India.
# Assuming 'df' is your DataFrame
data_2001 = df['Total - 2001']
data_2011 = df['Total - 2011']
# Perform a paired t-test
t_stat, p_value = stats.ttest_rel(data_2001, data_2011)
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference between 2001 and 2011.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference between 2001 and 2011.")
Reject the null hypothesis. There is a significant difference between 2001 and 2011.
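A paired t-test is equivalent to a one-sample t-test on the per-state differences, which makes explicit that each state is compared with itself. The sketch below illustrates this on the nine state rows visible in the `df.head(10)` output (not the full dataset); the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Total literacy rate (%) for the nine states shown in the df.head(10) output.
total_2001 = np.array([60.5, 54.3, 63.3, 47.0, 64.7, 82.0, 69.1, 67.9, 76.5])
total_2011 = np.array([67.0, 65.4, 72.2, 61.8, 70.3, 88.7, 78.0, 75.6, 82.8])

# Paired t-test on the two columns.
t_rel, p_rel = stats.ttest_rel(total_2001, total_2011)

# One-sample t-test on the differences against a mean of zero.
diff = total_2001 - total_2011
t_one, p_one = stats.ttest_1samp(diff, 0.0)

# The same statistic by hand: mean difference divided by its standard error.
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))

print(t_rel, t_one, t_manual)
```

All three values should agree. The statistic is negative here because the differences are taken as 2001 minus 2011.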
# Assuming 'df' is your DataFrame
data_2001 = df['Total - 2001']
data_2011 = df['Total - 2011']
# Perform a paired t-test
t_stat, p_value = stats.ttest_rel(data_2001, data_2011)
alpha = 0.05
# Create a DataFrame to hold the means and the half-widths of approximate 95% confidence intervals
# (1.96 * standard error of the mean)
results = pd.DataFrame({'Year': ['2001', '2011'],
                        'Mean': [data_2001.mean(), data_2011.mean()],
                        'CI_Half_Width': [1.96 * data_2001.sem(), 1.96 * data_2011.sem()]})
# Create a Plotly figure; error_y takes the error magnitude, not the absolute interval bounds
fig = px.bar(results, x='Year', y='Mean', error_y='CI_Half_Width', title='Comparison of Means (2001 vs. 2011)')
# Add a reference line at the 2001 mean when the difference is significant
if p_value < alpha:
    fig.add_shape(type="line",
                  x0=-0.5, x1=1.5, y0=data_2001.mean(), y1=data_2001.mean(),
                  line=dict(color="red"), name="2001 mean")
    fig.add_annotation(x=0.35, y=data_2001.mean() + 7, text="2001 mean", showarrow=False, font=dict(color="green", size=16)
                       )
# Show the p-value and alpha above the plot area (paper coordinates)
fig.add_annotation(xref="paper", yref="paper", x=1, y=1.08, text=f"p-value: {p_value:.3f}", showarrow=False, font=dict(color="orange", size=14))
fig.add_annotation(xref="paper", yref="paper", x=0, y=1.08, text=f"alpha: {alpha:.3f}", showarrow=False, font=dict(color="yellow", size=14))
fig.update_traces(marker_line_color='rgb(8,48,107)', marker_line_width=1.5,texttemplate='mean=%{y:.2f}',
textposition='inside', insidetextanchor='start', textfont=dict(color='black',size=14))
fig.update_layout(xaxis_title = 'Total Literacy rate (2001-2011)' )
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='green',zeroline=True, zerolinewidth=1.5, zerolinecolor='green')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='green',zeroline=True, zerolinewidth=1.5, zerolinecolor='green')
fig.show()
A p-value displayed as 0.000 does not mean the p-value is exactly zero; it means the value is smaller than 0.0005 and rounds to zero at three decimal places. In a paired t-test, such a small p-value is very strong evidence against the null hypothesis.
Specifically, in this case, where I am comparing data from 2001 and 2011, there is very strong statistical evidence that the mean total literacy rates of the two years differ. In other words, the difference between 2001 and 2011 is very unlikely to be due to chance alone.
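Because a very small p-value says little about the size of the change, it can help to report the mean difference together with a confidence interval. The following is a minimal sketch on the nine state rows visible in the `df.head(10)` output (not the full dataset), using the t distribution for the critical value; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Total literacy rate (%) for the nine states shown in the df.head(10) output.
total_2001 = np.array([60.5, 54.3, 63.3, 47.0, 64.7, 82.0, 69.1, 67.9, 76.5])
total_2011 = np.array([67.0, 65.4, 72.2, 61.8, 70.3, 88.7, 78.0, 75.6, 82.8])

# Per-state increase in percentage points.
diff = total_2011 - total_2001
n = len(diff)
mean_diff = diff.mean()
sem = diff.std(ddof=1) / np.sqrt(n)

# Two-sided 95% interval: mean +/- t_crit * standard error, with n - 1 degrees of freedom.
t_crit = stats.t.ppf(0.975, df=n - 1)
ci_low = mean_diff - t_crit * sem
ci_high = mean_diff + t_crit * sem

print(f"mean increase = {mean_diff:.2f} points, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that lies entirely above zero is consistent with rejecting the null hypothesis, and its width shows how precisely the average increase is estimated.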